library(mosaic)
library(car)
library(plotly)
library(pander)
library(readxl)
library(DT)
PayrollWins <- read_excel("../Data/PayrollWins2.xlsx")
One of America’s great pastimes has always been baseball. Since its creation in the 1800s, baseball has consistently been a huge spectator sport. With 30 Major League Baseball teams today, the MLB is still growing. A commonly disputed topic in the MLB is how the league monitors payroll. Unlike other professional sports leagues in the US, the MLB uses a luxury tax in place of a salary cap. This luxury tax began back in 1998, when smaller-market teams argued that larger-market teams with more money could afford to spend more on players. The luxury tax was implemented to even the playing field somewhat. The question raised is whether payroll really has a significant impact on a team’s performance. So much of a team’s success also depends on its minor league system. Is limiting payroll even worth it?
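We use the following hypotheses about the slope \(\beta_1\) and the intercept \(\beta_0\) of the regression of wins on relative payroll to frame our testing.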
\[ \left.\begin{array}{ll} H_0: \beta_1 = 0 \\ H_a: \beta_1 \neq 0 \end{array} \right\} \ \text{Slope Hypotheses} \]
\[ \left.\begin{array}{ll} H_0: \beta_0 = 0 \\ H_a: \beta_0 \neq 0 \end{array} \right\} \ \text{Intercept Hypotheses} \]
Keys for the data:
Year: year of the season of baseball
Team: MLB team
Payroll: Total payroll the team pays its players
RankPayroll: Where the team ranks among all 30 teams in payroll
W: The number of wins the team earned that regular season
L: The number of losses the team suffered that regular season
WPct: The team's wins divided by the total number of games played that year (the MLB regular season has 162 games)
RankWin: Where the team ranks among the 30 teams based on winning percentage
Attendance: Total number of people attending the team's games over the entire regular season
AttRank: Where the team ranks among all 30 teams in attendance
SeasonAVG: The average payroll of all 30 teams for that season
TeamSeason: The team’s payroll divided by the season average of all teams. This gives a better sense of how much each team is paying compared with every other team.
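The TeamSeason ratio described above can be sketched in a few lines of base R. This is only an illustration, assuming TeamSeason is simply Payroll divided by the per-year SeasonAVG. The `toy` data frame and its payroll figures are hypothetical and are not taken from PayrollWins2.xlsx.

```r
# Hypothetical toy data: three teams in one season (values are made up).
toy <- data.frame(
  Year    = c(2010, 2010, 2010),
  Team    = c("A", "B", "C"),
  Payroll = c(50e6, 100e6, 150e6)
)

# Season average payroll, computed within each Year (mirrors SeasonAVG).
toy$SeasonAVG <- ave(toy$Payroll, toy$Year, FUN = mean)

# Ratio of each team's payroll to the season average (mirrors TeamSeason).
toy$TeamSeason <- toy$Payroll / toy$SeasonAVG

print(toy)
```

A value of 1 means the team pays exactly the season average, so the ratio stays comparable across years even as overall payrolls change.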
datatable(PayrollWins,options = list(lengthMenu=c(5,10,30,100)))
The following plot has been designed to help visualize the linear relationship of the data.
winslm<-lm(W~TeamSeason, data=PayrollWins)
plot_ly(PayrollWins, x=~TeamSeason, y=~W, type='scatter', mode='markers', text=~paste(Year, Team)) %>%
add_lines(y=fitted(winslm), line=list(color='indigo'))%>%
layout(title="The Payroll's Effect on a Team's Performance",xaxis = list(title='Proportion to Season Average Payroll'), yaxis=list(title='Amount of Wins'))
par(mfrow=c(1,2))
plot(winslm, which=1:2)
The diagnostic plots above (residuals vs. fitted values and a normal Q-Q plot) were created to check whether the assumptions of simple linear regression are met.
From these plots, the assumptions appear to be reasonably, though not perfectly, satisfied. The Q-Q plot shows that the distribution of the error terms is approximately normal. The residuals vs. fitted plot shows an apparent linear relationship and relatively constant variance.
The regression results are presented in the following tables:
pander(summary(winslm))
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 68.66 | 1.163 | 59.02 | 1.364e-251 |
| TeamSeason | 12.25 | 1.081 | 11.33 | 4.334e-27 |
| Observations | Residual Std. Error | \(R^2\) | Adjusted \(R^2\) |
|---|---|---|---|
| 600 | 10.47 | 0.1767 | 0.1753 |
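According to the p-value for the slope (4.334e-27), there is sufficient evidence to reject the null hypothesis and conclude that there is a linear relationship between relative payroll and wins. The estimated relationship is as follows, where \(\hat{Y}_i\) is the predicted number of wins and \(X_i\) is the team's TeamSeason value: \[ \hat{Y}_i = 68.66 + 12.25 X_i \]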
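As a rough check on the slope test, the t statistic and two-sided p-value can be recomputed from the rounded estimate and standard error in the summary table. The sketch assumes 598 residual degrees of freedom (600 observations minus two coefficients). The 95% confidence interval is an addition for illustration and is not part of the original output. Because the inputs are rounded, the p-value will only approximately match the table.

```r
# Values copied from the summary table for the TeamSeason slope.
estimate  <- 12.25
std_error <- 1.081
n_obs     <- 600
df_resid  <- n_obs - 2   # assumption: intercept and slope are the only coefficients

# t statistic for H0: beta_1 = 0.
t_value <- estimate / std_error

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)

# 95% confidence interval for the slope (not shown in the original table).
margin <- qt(0.975, df = df_resid) * std_error
ci_95  <- c(lower = estimate - margin, upper = estimate + margin)

print(round(t_value, 2))
print(p_value)
print(round(ci_95, 2))
```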
We can conclude that there is a linear relationship between wins and the payroll of the team, although with \(R^2 = 0.1767\) payroll explains only about 18% of the variation in wins. A team with the average payroll is predicted to win around 81 games. Doubling the average payroll raises the prediction to about 93. Since a team normally needs over 90 wins to make the playoffs, we would recommend spending more than average if a team wants to compete.
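The predictions quoted above follow directly from the fitted equation. The sketch below reproduces them from the rounded coefficients in the summary table. The helper functions `predict_wins` and `ratio_for_wins` are hypothetical names introduced here for illustration.

```r
# Coefficients copied from the summary table.
b0 <- 68.66   # intercept
b1 <- 12.25   # slope for TeamSeason

# Hypothetical helper: predicted wins for a given payroll ratio.
predict_wins <- function(team_season) b0 + b1 * team_season

# Predicted wins at the average payroll and at double the average.
print(predict_wins(c(1, 2)))

# Hypothetical helper: payroll ratio at which the fitted line reaches a target.
ratio_for_wins <- function(target_wins) (target_wins - b0) / b1

# Payroll ratio needed for a predicted 90 wins.
print(ratio_for_wins(90))
```

Under the fitted line, a predicted 90 wins corresponds to a payroll of roughly 1.74 times the season average. These are point predictions only, and with a residual standard error of 10.47 wins, individual teams can land well above or below the line.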